In this paper, we consider the net assignment problem in a logic emulation system, also known as the board-level routing problem. An emulator board carries field-programmable gate arrays (FPGAs) and crossbars, and each FPGA is connected to each crossbar. Connection requests between FPGAs are called nets, and FPGAs are interconnected through the crossbars. The task is to assign each net to a suitable crossbar. This problem is known to be NP-complete in general. A polynomial-time algorithm is known for a restricted case in which only 2-terminal nets are treated. In this paper, we propose a new polynomial-time algorithm for this case.
Koichi HAMAMOTO Hiroshi FUKETA Masanori HASHIMOTO Yukio MITSUYAMA Takao ONOYE
Body biasing is expected to become a common design technique, and an area-efficient layout implementation is therefore in demand. Applying the body bias outside the standard cells is one possible layout. In this case, however, body-bias controllability is a concern, especially when forward bias is applied. To investigate the controllability, we fabricated and measured a ring oscillator in a 90 nm technology. Our measurement results and area-efficiency evaluation reveal that body-biased circuits can be implemented with an area overhead of less than 1% while retaining sufficient speed controllability.
Yutaka MASUDA Takao ONOYE Masanori HASHIMOTO
Software-based error detection techniques, which include error detection mechanism (EDM) transformation, are used for error localization in post-silicon validation. This paper evaluates the performance of EDM for timing error localization with a noise-aware logic simulator and 65-nm test chips, assuming the following two EDM usage scenarios: (1) localizing a timing error that occurred in the original program, and (2) localizing as many potential timing errors as possible. Simulation results show that the EDM transformation customized for quick error detection cannot locate electrical timing errors in the original program in the first scenario, but it detects 86% of non-masked errors (potential bugs) in the second scenario, which means that the EDM performs well at detecting electrical timing errors that affect execution results. Hardware measurement results show that the EDM detects 25% of original timing errors and 56% of non-masked errors. These hardware measurement results are not consistent with the simulation results. To investigate the reason, we focus on the following two differences between hardware and simulation: (1) the design of the power distribution network, and (2) the definition of timing error occurrence frequency. We update the simulation setup to close these differences and re-execute the simulation, and we confirm that the updated simulation results are consistent with the chip measurements.
Nobuyuki IWANAGA Tomoya MATSUMURA Akihiro YOSHIDA Wataru KOBAYASHI Takao ONOYE
A sound localization method for the proximal region is proposed, based on a low-cost 3D sound localization algorithm that uses head-related transfer functions (HRTFs). The auditory parallax model is applied to the current algorithm so that more accurate HRTFs can be used for sound localization in the proximal region. In addition, head-shadowing effects based on a rigid-sphere model are reproduced in the proximal region by means of a second-order IIR filter. A subjective listening test demonstrates the effectiveness of the proposed method. An embedded system implementation is also described, showing that the proposed method improves sound effects in the proximal region with increases of only 5.1% in memory capacity and 8.3% in computational cost.
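As a rough illustration of the second-order IIR filtering step mentioned above, the following is a minimal sketch of a generic Direct Form I biquad. The function name `biquad` and the coefficient values are placeholders chosen for illustration; they are not the head-shadowing coefficients derived from the rigid-sphere model in the paper.

```python
def biquad(samples, b0, b1, b2, a1, a2):
    """Generic second-order IIR filter (Direct Form I).

    y[n] = b0*x[n] + b1*x[n-1] + b2*x[n-2] - a1*y[n-1] - a2*y[n-2]
    """
    x1 = x2 = y1 = y2 = 0.0  # delayed inputs and outputs, initially silent
    out = []
    for x in samples:
        y = b0 * x + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2
        out.append(y)
        x2, x1 = x1, x  # shift the input delay line
        y2, y1 = y1, y  # shift the output delay line
    return out


# Placeholder coefficients (an assumption, not taken from the paper).
impulse = [1.0, 0.0, 0.0, 0.0]
response = biquad(impulse, b0=0.25, b1=0.5, b2=0.25, a1=-0.5, a2=0.0)
print(response)
```

A filter of this form needs only five multiplications and four stored values per sample, which is consistent with the low-cost emphasis of the abstract.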
Masashi OKADA Nobuyuki IWANAGA Tomoya MATSUMURA Takao ONOYE Wataru KOBAYASHI
In this paper, we propose a new 3D sound rendering method for multiple sound sources with limited computational resources. The method is based on fuzzy clustering, which achieves dual benefits of two general methods based on amplitude-panning and hard clustering. In embedded systems where the number of reproducible sound sources is restricted, the general methods suffer from localization errors and/or serious quality degradation, whereas the proposed method settles the problems by executing clustering-process and amplitude-panning simultaneously. Computational cost evaluation based on DSP implementation and subjective listening test have been performed to demonstrate the applicability for embedded systems and the effectiveness of the proposed method.
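To make the idea of fuzzy clustering concrete, here is a minimal sketch of the standard fuzzy c-means membership formula, in which each sound source receives a weight for every cluster rather than a single hard assignment. The function name `fuzzy_memberships`, the fuzzifier `m`, and the sample coordinates are assumptions for illustration; the abstract does not state which fuzzy clustering variant the authors use.

```python
import math


def fuzzy_memberships(point, centers, m=2.0):
    """Fuzzy c-means membership of one point in each cluster.

    u_i = 1 / sum_k (d_i / d_k) ** (2 / (m - 1)), where d_i is the
    distance from the point to center i. Memberships sum to 1.
    """
    dists = [math.dist(point, c) for c in centers]
    # A point lying exactly on a center belongs entirely to that center.
    for i, d in enumerate(dists):
        if d == 0.0:
            return [1.0 if j == i else 0.0 for j in range(len(centers))]
    exponent = 2.0 / (m - 1.0)
    return [1.0 / sum((d_i / d_k) ** exponent for d_k in dists)
            for d_i in dists]


# Hypothetical source position and two cluster centers.
weights = fuzzy_memberships((1.0, 0.0), [(0.0, 0.0), (3.0, 0.0)])
print(weights)
```

Soft weights of this kind could in principle serve as panning gains toward each cluster's representative position, which is one way to read the abstract's statement that clustering and amplitude panning are executed simultaneously.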
Shinya ABE Masanori HASHIMOTO Takao ONOYE
Influence of manufacturing variability on circuit performance has been increasing because of finer manufacturing process and lowered supply voltage. In this paper, we focus on mesh-style clock distribution which is believed to be effective for reducing clock skew, and we evaluate clock skew considering manufacturing and design variabilities. Considering MOS transistor variation -- random and spatially-correlated variation -- and non-uniform flip-flop (FF) placement, we demonstrate that spatially-correlated variation and severe non-uniform FF distribution can be major sources of clock skew. We also examine the dependency of clock skew on design parameters, and reveal that finer clock mesh does not necessarily reduce clock skew.
Hiroshi MIZUNO Hiroyuki KOBAYASHI Takao ONOYE Isao SHIRAKAWA
This paper devises an approach to performance estimation of an embedded hardware-software codesign system at the architecture level, with the aim of optimizing the hardware-software configuration in terms of processing time, power dissipation, and hardware cost. A distinctive feature of this approach is that it constructs a performance estimation model specific to each component of an embedded system, such as the CPU core, RAM/ROM, cache memory, and application-specific hardware, taking into account not only functional performance but also data transfer. The proposed estimation schemes are incorporated into an existing instruction set simulator, so that the actual performance can be estimated accurately at the architecture level. The experimental results demonstrate that the performance estimation approach enables precise design decisions at the architecture level, which greatly enhances design capability, particularly for mobile appliances.
Hiroshi FUKETA Masanori HASHIMOTO Yukio MITSUYAMA Takao ONOYE
Timing margin of a chip varies chip by chip due to manufacturing variability, and depends on operating environment and aging. Adaptive speed control with timing error prediction is promising to mitigate the timing margin variation, whereas it inherently has a critical risk of timing error occurrence when a circuit is slowed down. This paper presents how to evaluate the relation between timing error rate and power dissipation in self-adaptive circuits with timing error prediction. The discussion is experimentally validated using adders in subthreshold operation in a 90 nm CMOS process. We show a trade-off between timing error rate and power dissipation, and reveal the dependency of the trade-off on design parameters.
Takehiko AMAKI Masanori HASHIMOTO Takao ONOYE
This paper presents an oscillator-based true random number generator (TRNG) that dynamically unbiases 0/1 probability. The proposed TRNG automatically adjusts the duty cycle of a fast oscillator to 50%, and generates unbiased random numbers tolerating process variation and dynamic temperature fluctuation. A prototype chip of the proposed TRNG was fabricated with a 65 nm CMOS process. Measurement results show that the developed duty cycle monitor obtained the probability of ‘1’ 4,100 times faster than the conventional output bit observation, or estimated the probability with 70 times higher accuracy. The proposed TRNG adjusted the probability of ‘1’ to within 50±0.07% in five chips in the temperature range of 0°C to 75°C. Consequently, the proposed TRNG passed the NIST and DIEHARD tests at 7.5 Mbps with an area of 6,670 µm².
Ryo HARADA Yukio MITSUYAMA Masanori HASHIMOTO Takao ONOYE
This paper presents two circuits to measure pulse width distribution of single event transients (SETs). We first review requirements for SET measurement in accelerated neutron radiation test and point out problems of previous works, in terms of time resolution, time/area efficiency for obtaining large samples and certainty in absolute values of pulse width. We then devise two measurement circuits and a pulse generator circuit that satisfy all the requirements and attain sub-FO1-inverter-delay resolution, and propose a measurement procedure for assuring the absolute width values. Operation of one of the proposed circuits was confirmed by a radiation experiment of alpha particles with a fabricated test chip.
Toshihiro MASAKI Yasuhiro NAKATANI Takao ONOYE Nariyoshi YAMAI Koso MURAKAMI
This paper presents novel multimedia ATM networks that can transmit voice data efficiently and unify the switching methods among heterogeneous traffic. The fully ATM-based multimedia networks use fellow cell switches. The proposed assembly method can pack multiple calls with different virtual channel connections (VCCs) into one cell. Every call in a cell can be dynamically rearranged by the fellow cell switch to make efficient use of network resources. The switching functions are supported by shared virtual channel identifier (VCI) cells and the fellow cells within them. The fellow cell switch for 622 Mbps links is integrated into a single chip. Multimedia ATM networks that include voice transmission can be constructed by attaching the fellow cell switches to standard ATM switches.
Motoki KIMURA Morgan Hirosuke MIKI Takao ONOYE Isao SHIRAKAWA
A Java execution environment is implemented, in which a hardware engine is operated in parallel with an embedded processor. This pair of hardware facilities together with an additional software kernel are devised for existing embedded systems, so as to execute Java applications more efficiently in such a way that 39 instructions are added to the original Java Virtual Machine to implement the software kernel. The exploration of design parameters is also attempted to attain a low hardware cost and high performance. The proposed hardware engine of a 6-stage pipeline can be integrated in a single chip using 30 k gates together with the instruction and data cache memories. The proposed approach improves the execution speed by a factor of 5 in comparison with the J2ME software implementation.
Xuzhen XIE Takao ONO Shin-ichi NAKANO Tomio HIRATA
A nearly equitable edge-coloring of a multigraph is a coloring such that edges incident to each vertex are colored equitably in number. This problem was solved in O(kn^2) time, where n and k are the numbers of the edges and the colors, respectively. The running time was later improved to O(n^2/k + n|V|). We present a more efficient algorithm for this problem that runs in O(n^2/k) time.
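The property described above can be checked with a short routine. The sketch below assumes the usual formal definition, under which a k-coloring is nearly equitable when, at every vertex, the numbers of incident edges in any two colors differ by at most 2. It only verifies a given coloring and is not the algorithm of the paper. The function name `is_nearly_equitable` is hypothetical, and the multigraph is assumed to have no self-loops.

```python
from collections import defaultdict


def is_nearly_equitable(edges, coloring, k):
    """Check a k-edge-coloring of a loop-free multigraph.

    edges    : list of (u, v) pairs; parallel edges may repeat
    coloring : list of colors in range(k), one per edge
    Returns True if, at every vertex, the per-color counts of
    incident edges differ by at most 2.
    """
    counts = defaultdict(lambda: [0] * k)  # vertex -> edges per color
    for (u, v), color in zip(edges, coloring):
        counts[u][color] += 1
        counts[v][color] += 1
    return all(max(per_color) - min(per_color) <= 2
               for per_color in counts.values())


# Four parallel edges between two vertices, two colors available.
parallel = [("a", "b")] * 4
print(is_nearly_equitable(parallel, [0, 0, 1, 1], k=2))
print(is_nearly_equitable(parallel, [0, 0, 0, 0], k=2))
```

The check runs in time proportional to the number of edges plus k times the number of vertices, so it is cheap compared with constructing the coloring.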
Kosuke TOMITA Masahide HATANAKA Takao ONOYE
Viterbi decoding is used in many communication protocols, but its computational cost is high, so an efficient implementation is necessary. This paper describes a GPU implementation of a Viterbi decoder that uses the three-point Viterbi decoding algorithm (TVDA), in which the received bits are divided into multiple chunks and several chunks are decoded simultaneously. Coalesced memory access and Warp Shuffle, a newly introduced instruction, are also utilized to improve decoder performance. In addition, iterative execution of parallel chunk decoding reduces the latency of the proposed Viterbi decoder so that it can serve as part of a GPU-based SDR transceiver. As a result, the throughput of the proposed Viterbi decoder is improved by 23.1%.
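For readers unfamiliar with the baseline being accelerated, the following is a minimal sequential sketch of textbook hard-decision Viterbi decoding. It is neither the TVDA nor a GPU implementation. The rate-1/2, constraint-length-3 code with generators 7 and 5 (octal) is an assumption chosen for brevity, as are the function names `encode` and `viterbi_decode`.

```python
def encode(bits):
    """Rate-1/2 convolutional encoder, generators 111 and 101."""
    s1 = s0 = 0  # s1 is the most recent previous input bit
    out = []
    for b in bits:
        out += [b ^ s1 ^ s0, b ^ s0]
        s1, s0 = b, s1
    return out


def viterbi_decode(received):
    """Hard-decision Viterbi decoder for the encoder above."""
    inf = float("inf")
    metrics = [0, inf, inf, inf]      # the encoder starts in state 0
    paths = [[] for _ in range(4)]    # survivor path per state
    for t in range(0, len(received), 2):
        r0, r1 = received[t], received[t + 1]
        new_metrics = [inf] * 4
        new_paths = [None] * 4
        for state in range(4):
            if metrics[state] == inf:
                continue              # state not reachable yet
            s1, s0 = state >> 1, state & 1
            for b in (0, 1):
                # Hamming distance between expected and received pair.
                cost = metrics[state] + ((b ^ s1 ^ s0) != r0) + ((b ^ s0) != r1)
                nxt = (b << 1) | s1
                if cost < new_metrics[nxt]:
                    new_metrics[nxt] = cost
                    new_paths[nxt] = paths[state] + [b]
        metrics, paths = new_metrics, new_paths
    best = min(range(4), key=lambda s: metrics[s])
    return paths[best]


message = [1, 0, 1, 1, 0, 0]
coded = encode(message)
coded[3] ^= 1  # inject a single channel bit error
print(viterbi_decode(coded) == message)
```

Each trellis step depends on the metrics of the previous step, which is why a single stream is hard to parallelize and why chunk-based schemes such as the one in the abstract decode several chunks at once.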
In this paper, we propose a proof scheme of shuffle, which is an honest verifier zero-knowledge proof of knowledge such as the protocols by Groth and Furukawa. Unlike the previous schemes proposed by Furukawa-Sako, Groth, and Furukawa, our scheme can be used as the shuffle of the elements encrypted by Paillier's encryption scheme, which has an additive homomorphic property in the message part. The ElGamal encryption scheme used in the previous schemes does not have this property.
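The additive homomorphic property of Paillier encryption mentioned above can be illustrated with a toy example. This sketch uses the common simplification g = n + 1 and deliberately tiny primes, so it is insecure and serves only to show that multiplying two ciphertexts yields an encryption of the sum of the plaintexts. It does not implement the shuffle proof itself, and the function names `keygen`, `encrypt`, and `decrypt` are hypothetical.

```python
import math
import random


def keygen(p, q):
    """Toy Paillier key generation with g = n + 1 (insecure sizes)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)  # modular inverse, Python 3.8+
    return (n, n + 1), (lam, mu)


def encrypt(pub, m):
    """c = g^m * r^n mod n^2 for a random r coprime to n."""
    n, g = pub
    n2 = n * n
    while True:
        r = random.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    return (pow(g, m, n2) * pow(r, n, n2)) % n2


def decrypt(pub, priv, c):
    """m = L(c^lam mod n^2) * mu mod n, with L(u) = (u - 1) // n."""
    n, _ = pub
    lam, mu = priv
    u = pow(c, lam, n * n)
    return ((u - 1) // n * mu) % n


pub, priv = keygen(47, 59)  # small primes for illustration only
c1, c2 = encrypt(pub, 123), encrypt(pub, 456)
product = (c1 * c2) % (pub[0] ** 2)
print(decrypt(pub, priv, product))  # decrypts to 123 + 456 modulo n
```

ElGamal ciphertexts, by contrast, multiply to an encryption of the product of the messages, which is the distinction the abstract draws.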
Masahide HATANAKA Toshihiro MASAKI Takao ONOYE Koso MURAKAMI
This paper presents the switching control and VLSI architecture for the AAL2 switch. The ATM network with the AAL2 switch can efficiently transmit low-bit-rate data, even if the network has many endpoints. The switch is capable of not only switching AAL2 cells but also converting the header of other types of ATMs. The AAL2 switch is integrated into a single chip. The proposed ATM network is constructed by AAL2 switches attached to the ATM switches.
Daichi WATARI Ittetsu TANIGUCHI Francky CATTHOOR Charalampos MARANTOS Kostas SIOZIOS Elham SHIRAZI Dimitrios SOUDRIS Takao ONOYE
Energy management in buildings is vital for reducing electricity costs and maximizing the comfort of occupants. Excess solar generation can be used by combining a battery storage system and a heating, ventilation, and air-conditioning (HVAC) system so that occupants feel comfortable. Despite several studies on the scheduling of appliances, batteries, and HVAC, comprehensive and time-scalable approaches are still required that integrate predictive information such as renewable generation and thermal comfort. In this paper, we propose a thermal-comfort-aware online co-scheduling framework that combines optimal energy scheduling with prediction models of PV generation and thermal comfort using the model predictive control (MPC) approach. We introduce a photovoltaic (PV) energy nowcasting model and a thermal-comfort estimation model that provide useful information for optimization. The energy management problem is formulated as three coordinated optimization problems that cover fast and slow time scales by considering predicted information. This approach reduces the time complexity without a significant negative impact on the global nature and quality of the result. Experimental results show that our proposed framework achieves optimal energy management that takes into account the trade-off between electricity expenses and thermal comfort. Our sensitivity analysis indicates that introducing a battery significantly improves the trade-off relationship.
Fuma SAWA Yoshinori KAMIZONO Wataru KOBAYASHI Ittetsu TANIGUCHI Hiroki NISHIKAWA Takao ONOYE
Advanced driver-assistance systems (ADAS) play an important role in supporting safe driving by detecting potential risk factors beforehand and informing the driver of them. However, if too many ADAS services rely on visual technologies, the driver becomes increasingly burdened and fatigued, especially in the eyes. Drivers should be relieved of monitoring tasks other than the most important ones in order to reduce this burden as much as possible. In recent years, in-vehicle auditory signals that assist safe driving have attracted attention as an alternative to visual cues. In this paper, we develop an evaluation platform for in-vehicle auditory signals on an existing driving simulator. In addition, using in-vehicle auditory signals, we demonstrate with the developed platform that it is possible to partially switch from purely visual tasks to a mix of visual and auditory ones to alleviate the burden on drivers.
Takao ONOYE Toshihiro MASAKI Isao SHIRAKAWA Hiroaki HIRATA Kozo KIMURA Shigeo ASAHARA Takayuki SAGISHIMA
This paper describes the design procedure of a multithreaded processor dedicated to image generation, carried out by means of the high-level synthesis tool PARTHENON. The processor employs a multithreaded architecture, a novel and promising approach to parallel image generation. This paper puts special stress on the high-level synthesis scheme, which simplifies the behavioral description of the structure and control of complex hardware and therefore enables the design of a complicated mechanism for a multithreaded processor. Implementation results of the synthesis are also shown to demonstrate the performance of the designed processor. This processor greatly improves the image generation throughput attained so far by the conventional approach.